Assignment 5

### Loading libraries

library(gapminder)
library(tidyverse)
library(dplyr)
library(forcats)
library(ggplot2)
library(plotly)

Exercise 1

Exercise 2: Factor management

Task: Choose one dataset (of your choice) and a variable to explore. After ensuring the variable(s) you’re exploring are indeed factors, you should:

Drop factor / levels; Reorder levels based on knowledge from data.

Explore the effects of re-leveling a factor in a tibble by:

comparing the results of arrange on the original and re-leveled factor. Plotting a figure of before/after re-leveling the factor (make sure to assign the factor to an aesthetic of your choosing). These explorations should involve the data, the factor levels, and at least two figures (before and after.

2.1 Checking class and exploring drop function

First, let’s check if the variable of choice is a factor or not. #### Checking class

class(gapminder$country) #confirms that the country variable is a factor
## [1] "factor"
#levels(gapminder$country) : to see all the levels in the factor-country
nlevels(gapminder$country) # shows the number of countries in the dataset
## [1] 142

Now, let’s select all the countries in the continent of Asia and see how drop function can be useful. The former set of code evaluates without using drop function. When calculating number of countries in Asia we get the number of total countries in the database ie 142. The latter code uses drop function and gives the actual number of countries in Asia ie 33. This showcases the importance of using drop function while working with factors.

Without dropping

#Let's filter all the countries in the continent of Asia
Asia_country<- gapminder %>% 
  filter(continent == c("Asia"))
nlevels(Asia_country$country) # counting number of countries in Asia without using the drp
## [1] 142

With dropping

Asia_country$country %>% 
fct_drop() %>%  #droplevels() is the base r equivalent and can also be used. 
nlevels()
## [1] 33

2.2: Reordering

To show how reordering can be helpful, let’s consider one particular year. For this purpose, 2007 has been chosen and the corresponding lollipop plot has been plotted.

Lolliplot without re-ordering:

q<-Asia_country %>% 
  filter(year=="2007") %>% 
  ggplot(aes(x = country, y = lifeExp))+
  geom_segment(aes(x=country, xend=country, y=0, yend=lifeExp), color="skyblue")+
  geom_point(color="blue", size=3, alpha=0.5)+
  labs(
    x = "Country",
    y = "Life Expectancy")+
  theme_bw()+
  coord_flip()

q %>% 
  ggplotly()

Lolliplot after re-ordering:

p<-Asia_country %>% 
  filter(year=="2007") %>% 
  ggplot(aes(x = fct_reorder(country, lifeExp), y = lifeExp))+
  geom_segment(aes(x = fct_reorder(country,lifeExp), xend = fct_reorder(country, lifeExp), y=0, yend=lifeExp), color="skyblue")+
  geom_point(color="blue", size=3, alpha=0.5)+
  theme_bw()+
   labs(
    x = "Country",
    y = "Life Expectancy")+
  coord_flip()
p %>%   
ggplotly()

Exercise 3

Task: Experiment with at least one of:

write_csv()/read_csv() (and/or TSV friends), saveRDS()/readRDS(), dput()/dget().

Let’s look at the incremental increase in life expectancy of each country in Asia and Africa from the initial year recorded. The function first() is used to calculate incremental increase. The resulting data is stored in a csv file called gapminder_inclife.csv by using the here() function.

Further, the data is read back using here() function and explored using

gapminder_inclife<- gapminder %>%
  group_by(country) %>% 
  arrange(year) %>% 
  mutate(incr_lifeexp = lifeExp-first(lifeExp)) %>% 
  arrange(country)%>% 
  filter(continent==c("Asia","Africa"))
DT::datatable(gapminder_inclife)
write.csv(gapminder_inclife, here::here("HW05","gapminder_inclife.csv"))

read.csv(here::here("HW05","gapminder_inclife.csv")) %>% 
  DT::datatable()
##fct_reorder2(gapminder_inclife$continent, gapminder_inclife$gdpPercap,min)
fct_count(fct_drop((gapminder_inclife$continent)))
## # A tibble: 2 x 2
##   f          n
##   <fct>  <int>
## 1 Africa   312
## 2 Asia     198

Same number of entries(510) and same variables can be observed in the initital tibble and the one after exporting to the system and re-reading it. ## Exercise 4

Task: Create a side-by-side plot and juxtapose your first attempt (show the original figure as-is) with a revised attempt after some time spent working on it and implementing principles of effective plotting principles. Comment and reflect on the differences.

For the assignment three, I had plotted GDP per capita vs life expectancy in Asia. Here is the before-visualization lecture:

gapminder %>% 
  filter(continent=='Asia') %>%
  group_by(year) %>% 
  arrange(year) %>% 
  ggplot(aes(gdpPercap,lifeExp, size=pop/1000000))+
  geom_point(alpha=0.7, shape=21, color="purple")+
  labs(title="GDP per cap vs Life expectancy in Asia",
       x="GDP per capita",
       y="Life Expectancy")

Now, let’s see ‘after’ the visualization lecture:

p<-gapminder %>% 
  filter(continent=='Asia') %>%
  group_by(year) %>% 
  arrange(year) %>% 
  ggplot(aes(gdpPercap,lifeExp, size=pop/1000000))+
  geom_point(alpha=0.3, shape=21, color="purple")+
  labs(title="GDP per cap vs Life expectancy in Asia",
       x="GDP per capita",
       y="Life Expectancy")
  ggplotly(p)

Exercise 5